import scipy.io as sio
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
Pavia = sio.loadmat('PaviaU_cube.mat')
HSI = Pavia['X'] #Pavia HSI : 300x200x103
ends = sio.loadmat('PaviaU_endmembers.mat') # Endmember's matrix: 103x9
endmembers = ends['endmembers']
fig = plt.figure()
plt.plot(endmembers)
plt.ylabel('Radiance values')
plt.xlabel('Spectral bands')
plt.title('9 Endmembers spectral signatures of Pavia University HSI')
plt.show()
#Perform unmixing for the pixels corresponding to nonzero labels
ground_truth= sio.loadmat('PaviaU_ground_truth.mat')
labels=ground_truth['y']
fig = plt.figure()
plt.imshow(HSI[:,:,102])
plt.title('Visualization of the 103rd band of Pavia University HSI')
plt.show()
# For the non-negative least squares unmixing algorithm you can use the nnls function, see the following link:
#https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.optimize.nnls.html#scipy.optimize.nnls
We will follow 6 different approaches in order to tackle the problem of spectral unmixing. We will model each pixel as a linear combination of the endmembers. $X$ is a matrix of shape (103,9) where every row is a band of the signatures and every column represents one endmember. $Y$ is a vector of shape (103,1) where every row is one band of the spectral signature of the pixel that we wish to unmix. Finally, the vector $\theta$ contains the coefficients of the linear combination. Our model can be expressed in a linear way as $X\theta + \epsilon = Y$, with the least squares cost function $||X\theta - Y||_2^2$.
Lasso: we will impose an $l_1$ penalization. The cost function is modified to: $$\underset{\theta}{\arg\min}||X\theta - Y||_2^2 + \lambda \sum_{j=1}^{l}|\theta_j|$$
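As a sanity check of the plain least squares step, here is a minimal sketch on synthetic data. The sizes (103 bands, 9 endmembers) match the Pavia setup described above, but the signatures, abundances, and noise level are made-up assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical endmember matrix: 103 bands x 9 endmembers
X = rng.random((103, 9))
# a synthetic "true" abundance vector that sums to one
theta_true = rng.dirichlet(np.ones(9))
# observed pixel spectrum = linear mixture plus small noise
Y = X @ theta_true + rng.normal(0, 1e-3, 103)

# unconstrained least squares: theta = (X^T X)^{-1} X^T Y
theta_ls = np.linalg.solve(X.T @ X, X.T @ Y)
print(np.linalg.norm(theta_ls - theta_true))  # close to zero
```

With many more bands than endmembers and little noise, the analytical solution recovers the abundances almost exactly; the constraints of the other approaches matter when the noise and the endmember correlation grow.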
Adaptive Lasso: instead of implementing a traditional Lasso model, we will implement the Adaptive Lasso. Following what is proposed in "Semi-Supervised Hyperspectral Unmixing Via the Weighted Lasso" by Konstantinos E. Themelis, Athanasios A. Rontogiannis and Konstantinos Koutroumbas, we will impose weights on the coefficients, so that not all of them are penalized by the same amount $\lambda$. As proposed, the weights have an inverse relation to the corresponding least squares estimates: we use as weight the quantity $w = \frac{1}{|\theta_{ls}|}$, so that small values of $\theta_{ls}$ penalize the corresponding $\theta_{lasso}$ more, while large values of $\theta_{ls}$ penalize $\theta_{lasso}$ less. This introduces the desired additional sparsity. The cost function is adjusted to:
$$\underset{\theta}{\arg\min}||X\theta - Y||_2^2 + \lambda \sum_{j=1}^{l}w_j|\theta_j| \ \ \textrm{where} \ \ w_j = \frac{1}{|\theta_{ls,j}|}$$
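To make the effect of the weights concrete, here is a small numerical sketch. The coefficient values and $\lambda$ below are made-up assumptions chosen only to show how a small least-squares coefficient inflates its own penalty:

```python
import numpy as np

# toy least-squares estimate: two strong endmembers, one spurious one
theta_ls = np.array([0.7, 0.25, 0.01])
theta = np.array([0.65, 0.22, 0.01])   # candidate solution being penalized
lam = 0.1

# plain Lasso penalizes every coefficient by the same lambda
plain_penalty = lam * np.sum(np.abs(theta))

# adaptive Lasso weights w_j = 1/|theta_ls_j| penalize the small
# least-squares coefficient far more, pushing it toward exact zero
w = 1 / np.abs(theta_ls)
adaptive_penalty = lam * np.sum(w * np.abs(theta))

print(plain_penalty, adaptive_penalty)
```

The third coefficient contributes $\lambda \cdot 0.01$ to the plain penalty but $\lambda \cdot 1.0$ to the adaptive one, which is exactly the extra sparsity pressure described above.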
The technical implementation involves an iterative method for calculating the above quantities. We will iterate over the pixels only once and calculate all the aforementioned quantities, ending up with 6 matrices, each of dimension (300, 200, 9). For each element of each matrix, which is essentially a pixel, we store the values $\theta_{approach}$ of the corresponding approach (i.e. the coefficients of the linear combination of the endmembers). For the Least Squares approach we will use the analytical solution, while for the constrained least squares variants we will use cvxopt. Finally, for Lasso we use sklearn.linear_model.Lasso and for the Adaptive Lasso we will use scipy.optimize.minimize.
In order to avoid optimizing lambda for each pixel, we can sample some pixels in a stratified way so as to retain the class distribution, and calculate the optimal lambdas for those pixels only. We can then extract a statistic as an approximate value of the optimal lambda for each endmember.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LassoCV
import pandas as pd
#concatenate the labels with each spectral signature
lambda_y = np.column_stack([HSI[labels!=0],labels[labels!=0]])
#sample 20% of the pixels, stratified on the endmembers class
sample,_ = train_test_split(lambda_y,train_size=0.20,stratify=lambda_y[:,-1])
X = endmembers
lambdas = []
#calculate the optimized lambdas for each pixel
for pixel in sample:
    lasso = LassoCV(cv=3, max_iter=20000, fit_intercept=False, random_state=123, tol=0.001).fit(X, pixel[:-1])
    lambdas.append(lasso.alpha_)
#aggregate by each endmember using the median value of lambda
df_lambdas = pd.DataFrame(lambdas)
df_lambdas['label'] = sample[:,-1]
The median optimal values for the regularization parameter $\lambda$ are
optimal_lambdas = df_lambdas.groupby('label').median()
optimal_lambdas
Having a statistic of the regularization parameter $\lambda$ for each label, we can start calculating the unmixing parameters $\theta$.
from cvxopt import matrix, solvers
from sklearn.linear_model import LassoCV, Lasso
import pandas as pd
from l1regls import l1regls
from scipy.optimize import minimize
solvers.options['show_progress'] = False
#endmembers are assigned as X
X = endmembers
# create matrices (300,200,9) for each approach
thetas_ls = np.empty((HSI.shape[0],HSI.shape[1],endmembers.shape[1]))
thetas_nn = np.empty((HSI.shape[0],HSI.shape[1],endmembers.shape[1]))
thetas_sum_one = np.empty((HSI.shape[0],HSI.shape[1],endmembers.shape[1]))
thetas_nn_sum_one = np.empty((HSI.shape[0],HSI.shape[1],endmembers.shape[1]))
thetas_lasso = np.empty((HSI.shape[0],HSI.shape[1],endmembers.shape[1]))
thetas_adaptive_lasso = np.empty((HSI.shape[0],HSI.shape[1],endmembers.shape[1]))
#loop through the pixels without flattening the array
for i in range(HSI.shape[0]):
    for j in range(HSI.shape[1]):
        #if the label == 0, proceed to the next pixel without processing
        label = labels[i][j]
        if label == 0:
            continue
        #Y is the spectral signature of each pixel
        Y = HSI[i][j]
        #Least Squares: analytical solution
        thetas = np.linalg.inv(X.T @ X) @ X.T @ Y
        thetas_ls[i][j] = thetas
        #Define our matrices for cvxopt
        P = matrix(np.dot(X.T, X))
        q = matrix(-(X.T @ Y))
        b = matrix(1.0)
        A = matrix(np.ones(X.shape[1]), (1, X.shape[1]))
        G = matrix(-np.identity(X.shape[1]))
        h = matrix(np.zeros(X.shape[1]))
        #non-negative thetas
        thetas_nn[i][j] = np.array(solvers.qp(P=P, q=q, G=G, h=h, A=None, b=None)['x']).flatten()
        #thetas summing to one
        thetas_sum_one[i][j] = np.array(solvers.qp(P=P, q=q, G=None, h=None, A=A, b=b)['x']).flatten()
        #non-negative thetas summing to one
        thetas_nn_sum_one[i][j] = np.array(solvers.qp(P=P, q=q, G=G, h=h, A=A, b=b)['x']).flatten()
        #retrieve the optimal lambda for the pixel's class
        lambda_ = optimal_lambdas.iloc[label-1][0]
        #Lasso
        lasso = Lasso(fit_intercept=False, alpha=lambda_, max_iter=5000, tol=0.001).fit(X, Y)
        thetas_lasso[i][j] = lasso.coef_
        #Adaptive Lasso
        #the weights are the inverse of the least squares coefficients calculated above
        weights = 1 / thetas_ls[i][j]
        #initial value for the thetas
        theta_0 = np.zeros(X.shape[1])
        #define the cost function of the adaptive lasso
        cost = lambda theta: (X@theta - Y).T @ (X@theta - Y) + lambda_ * np.sum(np.abs(weights * theta))
        #minimize the cost function and assign the result to the respective pixel
        result = minimize(cost, theta_0, method='L-BFGS-B', options={'disp': False}, tol=0.0005)
        thetas_adaptive_lasso[i][j] = result['x']
In order to better understand the performance of the models, as well as to recognize endmembers where a model does not perform well, we can calculate the reconstruction error per endmember.
#define a function calculating the reconstruction error
def reconstruction_error(X, thetas, Y):
    squared_error = np.sum((thetas @ X.T - Y)**2, axis=1)
    return squared_error
#keep the pixels which are labeled as non zero for calculating the reconstruction error
Y = HSI[np.where(labels!=0)]
#for each approach, calculate the recostruction error
reconstruction_error_ls = reconstruction_error(X, thetas_ls[np.where(labels!=0)], Y)
reconstruction_error_nn = reconstruction_error(X, thetas_nn[np.where(labels!=0)], Y)
reconstruction_error_sum_one = reconstruction_error(X, thetas_sum_one[np.where(labels!=0)], Y)
reconstruction_error_nn_sum_one = reconstruction_error(X, thetas_nn_sum_one[np.where(labels!=0)], Y)
reconstruction_error_lasso = reconstruction_error(X, thetas_lasso[np.where(labels!=0)], Y)
reconstruction_error_adaptive_lasso = reconstruction_error(X, thetas_adaptive_lasso[np.where(labels!=0)], Y)
#create a dataframe with the error of each pixel. we also create a column with the label of each pixel
error = pd.DataFrame({
'least_squares' : reconstruction_error_ls
,'least_squares_sum_one' : reconstruction_error_sum_one
,'least_squares_non_negative' : reconstruction_error_nn
,'least_squares_non_negative_sum_one': reconstruction_error_nn_sum_one
,'lasso' : reconstruction_error_lasso
,'adaptive_lasso' : reconstruction_error_adaptive_lasso
,'label' : labels[np.where(labels!=0)]
})
#create a dataframe with the names of the labels
label_descriptions = pd.DataFrame({
'label' : [1,2,3,4,5,6,7,8,9]
,'description' : ['water', 'trees', 'asphalt', 'bricks', 'bitumen', 'tiles', 'shadows', 'meadows', 'bare soil']
})
#join with the names of the labels and calculate the mean reconstruction error per label
error = pd.merge(error, label_descriptions, left_on='label' ,right_on='label')
agg = error.groupby('description').mean()
agg
#plot the reconstruction error per label
import seaborn as sns
sns.set(palette='deep', color_codes=True, font_scale=1.3)
fig, axs = plt.subplots(3,3, figsize = (22,18))
axs = axs.ravel()
fig.subplots_adjust(hspace=0.4, wspace=0.5)
fig.suptitle('Mean Reconstruction Error of different Models')
for i in range(len(agg.T.columns)):
    sns.barplot(agg.T.iloc[:,i], agg.T.index, ax=axs[i])
    axs[i].set(yticklabels=['LS', 'Sum_1', 'NN', 'Sum_1 NN', 'Lasso', 'Adaptive Lasso'])
    axs[i].title.set_text('Pixels labeled as: ' + agg.T.columns[i])
    axs[i].set_ylabel('')
    axs[i].set_xlabel('Mean Reconstruction Error')
    axs[i].ticklabel_format(style='sci', axis='x', scilimits=(0,0))
Observations: later, while plotting the abundance maps, we will be able to correlate the reconstruction error plots with the quality of the abundance maps.
def ambundance_map(label):
    fig, axs = plt.subplots(2, 4, figsize=(18,10))
    plt.subplots_adjust(wspace=0.3)
    cmap = sns.diverging_palette(220, 20, as_cmap=True)
    axs[0][0].imshow((labels == label).astype(int), cmap=cmap)
    axs[0][0].grid(False)
    axs[0][0].title.set_text('Ground Truth')
    axs[0][1].imshow(thetas_ls[:,:,label-1], cmap=cmap)
    axs[0][1].grid(False)
    axs[0][1].title.set_text('1. Least Squares')
    axs[0][2].imshow(thetas_sum_one[:,:,label-1], cmap=cmap)
    axs[0][2].grid(False)
    axs[0][2].title.set_text(r'2. LS $\sum θ=1$')
    axs[0][3].imshow(thetas_nn[:,:,label-1], cmap=cmap)
    axs[0][3].grid(False)
    axs[0][3].title.set_text('3. LS $θ>0$')
    axs[1][0].imshow(thetas_nn_sum_one[:,:,label-1], cmap=cmap)
    axs[1][0].grid(False)
    axs[1][0].title.set_text(r'4. LS $θ>0$ and $\sum θ=1$')
    axs[1][1].imshow(thetas_lasso[:,:,label-1], cmap=cmap)
    axs[1][1].grid(False)
    axs[1][1].title.set_text('5. Lasso')
    axs[1][2].imshow(thetas_adaptive_lasso[:,:,label-1], cmap=cmap)
    axs[1][2].grid(False)
    axs[1][2].title.set_text('6. Adaptive Lasso')
    axs[1][3].axis('off')
    plt.show()
ambundance_map(1)
Regarding the water abundance map, we can clearly see that models 3 and 4 perform really well, while models 1 and 2 do not. Model 5 (Lasso) is able to identify the water trail, but with lots of noise around it. Model 6 has imposed some additional sparsity, but it contains noise as well.
ambundance_map(2)
Models 3 and 4 are able to identify trees, with a lot of noise coming from tiles. Models 1 and 2 still underperform; they identify the trees but with lots of additional noise. Model 5 was able to suppress much of the noise, but still suffers from noise coming from tiles. Model 6, even though it imposed additional sparsity, also contains noise coming from tiles.
ambundance_map(3)
Models 3 and 4 identify asphalt, though with lots of additional noise. Model 5 contains lots of noise coming from other endmembers. Model 6 has managed to impose sparsity and unmix a considerable amount of asphalt.
ambundance_map(4)
Here all models perform well
ambundance_map(5)
All models are able to identify bitumen. There is some additional noise in model 4; both Lasso and Adaptive Lasso have managed to suppress it.
ambundance_map(6)
Tiles are identified by models 3 and 4, although model 4 contains some additional noise coming from trees. Models 1 and 2 are noisy. Model 5 has a clearer outline of the shape of the tiles. Model 6 imposed greater sparsity and unmixed the tiles more vividly.
ambundance_map(7)
Model 4 performs well. Model 5 was not able to capture shadows at all. Model 2, as well as model 6, was able to unmix shadows, but with much additional noise.
ambundance_map(8)
Models 4 and 5 are able to identify meadows. There is additional noise coming from asphalt.
ambundance_map(9)
Here models 4 and 5 are able to identify bare soil
# Training set for classification
Pavia_labels = sio.loadmat('classification_labels_Pavia.mat')
Training_Set = (np.reshape(Pavia_labels['training_set'],(200,300))).T
Test_Set = (np.reshape(Pavia_labels['test_set'],(200,300))).T
Operational_Set = (np.reshape(Pavia_labels['operational_set'],(200,300))).T
fig = plt.figure()
plt.imshow(Training_Set)
plt.title('Labels of the pixels of the training set')
plt.show()
X_train = HSI[np.where(Training_Set>0)]
y_train = Training_Set[np.where(Training_Set>0)]
X_test = HSI[np.where(Test_Set>0)]
y_test = Test_Set[np.where(Test_Set>0)]
We will plot the correlation map for each class in order to get an idea of how well the classifiers will perform
cmap = sns.diverging_palette(220, 20, as_cmap=True)
fig, axs = plt.subplots(3,3, figsize=(18,12))
axs = axs.ravel()
plt.subplots_adjust(hspace=0.4)
fig.suptitle('Correlation of Spectral Bands for each class')
for i in range(9):
    x_class = X_train[y_train==i+1]
    corr = np.corrcoef(x_class.T)
    #mask the upper triangle, since the correlation matrix is symmetric
    mask = np.triu(np.ones_like(corr))
    sns.heatmap(corr, mask=mask, vmin=-1, vmax=1, cmap=cmap, center=0, square=True, cbar_kws={"shrink": .5}, ax=axs[i])
    axs[i].title.set_text('Class: ' + label_descriptions.iloc[i,1])
For every class, we observe very high correlations between the bands of each signature. This might harm models that assume diagonal covariance matrices such as the Naive Bayes and the Minimum Euclidean Distance Classifier.
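A hypothetical illustration of why this matters (synthetic two-band data, not the Pavia cube): two classes below share the same means and per-band variances and differ only in the sign of the band-to-band correlation, so a diagonal-covariance model has nothing to discriminate on, while a full-covariance model does.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(42)
n = 500
# two synthetic classes: identical means and marginal variances,
# opposite band-to-band correlation
pos = np.array([[1.0, 0.95], [0.95, 1.0]])
neg = np.array([[1.0, -0.95], [-0.95, 1.0]])
X = np.vstack([rng.multivariate_normal([0, 0], pos, n),
               rng.multivariate_normal([0, 0], neg, n)])
y = np.array([0] * n + [1] * n)

# GaussianNB sees (near-)identical per-band distributions for both
# classes; QDA estimates the full per-class covariance matrix
nb_acc = GaussianNB().fit(X, y).score(X, y)
qda_acc = QuadraticDiscriminantAnalysis().fit(X, y).score(X, y)
print(nb_acc, qda_acc)  # QDA clearly higher
```

This is an extreme construction, but it shows the mechanism by which the high inter-band correlations above can hurt Naive Bayes and the Minimum Euclidean Distance classifier.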
def cross_validation(X_train, y_train, folds, classifier):
    '''
    Performs k-fold cross validation and returns statistics.
    Parameters:
        X_train (array): the training dataset
        y_train (array): the labels of the training dataset
        folds (int): the number of folds of the cross validation
        classifier: initialized object of a classifier.
            It needs to implement a fit and a predict method
    Returns:
        mean_validation_error (float): the mean validation error over the k folds
        standard_deviation_error (float): the standard deviation of the validation errors
        validation_errors (array): the validation error for each fold
    '''
    #set a random seed
    np.random.seed(123)
    #create a dataframe for easier manipulation
    df = pd.DataFrame(X_train)
    df['y'] = y_train
    #create a column with random integers, shuffling the data
    df['shuffle'] = np.random.randint(0, 1000, len(X_train))
    #pandas' qcut() partitions the dataset into k equal subsets, labeled 1 to folds
    df['partition'] = pd.qcut(df.shuffle, folds, labels=range(1, folds+1))
    validation_errors = []
    for i in range(1, folds+1):
        #for each fold, keep the subsets not equal to the fold as train
        x_tr = df[df.partition != i].iloc[:,0:103]
        y_tr = df[df.partition != i].y
        #keep the subset equal to the fold as validation
        x_validation = df[df.partition == i].iloc[:,0:103]
        y_validation = df[df.partition == i].y
        #fit on the train subset
        classifier.fit(x_tr, y_tr)
        #predict on the validation subset
        predictions = classifier.predict(x_validation)
        #calculate the classification error
        error = np.sum(y_validation != predictions) / len(y_validation)
        validation_errors.append(error)
    res = {
        'mean_validation_error' : np.mean(validation_errors)
        ,'standard_deviation_error' : np.std(validation_errors)
        ,'validation_errors' : validation_errors
    }
    return res
from sklearn.naive_bayes import GaussianNB
naive_bays_results = cross_validation(X_train, y_train, 10, GaussianNB())
naive_bays_results
Here we will define a custom classifier in order to be able to pass it through our cross validation function. We need to implement the fit and predict methods in accordance with the sklearn classifiers.
class MinimumEuclideanDistanceClassifier:
    def __init__(self, classes):
        self.classes = classes
        self.means = None
    def fit(self, X, y):
        '''
        Calculates the mean of each class. It creates a dataframe object and
        attaches the labels as a column in order to aggregate the features on them.
        Parameters:
            X (array): array of features
            y (array): labels of observations
        Returns:
            MinimumEuclideanDistanceClassifier
        '''
        df = pd.DataFrame(X)
        df['y'] = y
        means = df.groupby('y').mean()
        self.means = np.array(means)
        return self
    def predict(self, X):
        '''
        Assigns each observation to the closest class mean, using broadcasting
        to avoid loops. For more info on broadcasting please check
        https://numpy.org/doc/stable/user/basics.broadcasting.html
        Parameters:
            X (array): array of observations to predict
        Returns:
            predictions (array): vector of predictions
        '''
        X = np.array(X)
        distances = np.sum((X[:,np.newaxis,:] - self.means)**2, axis=2)
        predictions = np.argmin(distances, axis=1) + 1
        return predictions
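The broadcasting step inside predict can be checked in isolation; the 4 observations below are arbitrary, while 103 bands and 9 classes match the setup above:

```python
import numpy as np

# 4 observations x 103 bands against 9 class means of 103 bands each;
# X[:, np.newaxis, :] has shape (4, 1, 103), which broadcasts against
# (9, 103) to give a (4, 9, 103) array of per-band differences
X = np.zeros((4, 103))
means = np.zeros((9, 103))
distances = np.sum((X[:, np.newaxis, :] - means) ** 2, axis=2)
print(distances.shape)  # (4, 9): one squared distance per observation/class pair
```

Summing over the band axis leaves one squared Euclidean distance per (observation, class) pair, and argmin over axis 1 picks the closest class.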
ed_res = cross_validation(X_train, y_train, 10, MinimumEuclideanDistanceClassifier(classes=9))
ed_res
#perform cross validation in order to tune the K hyperparameter
from sklearn.neighbors import KNeighborsClassifier
neighbors = {}
for k in range(3, 10):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn_res = cross_validation(X_train, y_train, 10, knn)
    neighbors[k] = knn_res['mean_validation_error']
neighbors
# validation only with 7 nearest neighbors
knn_res = cross_validation(X_train, y_train, 10, KNeighborsClassifier(n_neighbors=7))
knn_res
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda = QuadraticDiscriminantAnalysis(store_covariance=True)
qda_res = cross_validation(X_train, y_train, 10, qda)
qda_res
#create a dataframe with all the validation errors
errors = pd.DataFrame({
'fold' : range(1,11)
,'naive_bayes': naive_bays_results['validation_errors']
,'min_eucl_dist' : ed_res['validation_errors']
,'knn' : knn_res['validation_errors']
,'bayes_classifier': qda_res['validation_errors']
})
errors
import seaborn as sns
sns.set(palette='deep', color_codes=True, font_scale=1.3)
errors = pd.melt(errors, id_vars='fold', var_name='model', value_name='validation_error')
#plot the validation errors over the folds
fig = plt.figure(figsize=(14,10))
gs = fig.add_gridspec(2, 2)
plt.subplots_adjust(hspace=0.4)
ax1 = fig.add_subplot(gs[0, :])
ax1.title.set_text('Validation Error per Fold')
sns.lineplot(errors.fold, errors.validation_error, hue=errors.model)
ax1.set_ylabel('Validation Errors')
#aggregate the errors per model
agg = errors.groupby(['model']).agg(
mean_validation_error = ('validation_error', np.mean)
,standard_deviation_of_error = ('validation_error', np.std)
)
agg.sort_values('mean_validation_error', inplace=True, ascending=False)
palette = [sns.color_palette()[1], sns.color_palette()[0], sns.color_palette()[2], sns.color_palette()[3]]
ax2 = fig.add_subplot(gs[1 , 0])
sns.barplot(agg.index, agg.mean_validation_error, palette=palette)
ax2.title.set_text('Mean Validation Error')
ax2.set_ylabel('')
ax2.set_xlabel('')
ax3 = fig.add_subplot(gs[1 , 1])
sns.barplot(agg.index, agg.standard_deviation_of_error, palette=palette)
ax3.title.set_text('Standard Deviation of Validation Error')
ax3.set_ylabel('')
ax3.set_xlabel('')
plt.show()
A few Observations:
gnb = GaussianNB().fit(X_train, y_train)
med = MinimumEuclideanDistanceClassifier(9).fit(X_train, y_train)
knnc = KNeighborsClassifier(n_neighbors=7).fit(X_train, y_train)
bc = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_train, y_train)
gnb_predictions = gnb.predict(X_test)
med_predictions = med.predict(X_test)
knnc_predictions = knnc.predict(X_test)
bc_predictions = bc.predict(X_test)
from sklearn.metrics import confusion_matrix, f1_score, accuracy_score
def plot_confusion_matrix(confusion_matrix, classifier):
    fig, axs = plt.subplots(figsize=(12,8))
    axs = sns.heatmap(
        confusion_matrix
        ,annot=True
        ,cmap='Blues'
        ,linewidths=0.1
        ,xticklabels=label_descriptions.description
        ,yticklabels=label_descriptions.description
        ,ax=axs
        ,fmt='.2f'
        ,cbar_kws={'shrink':0.5}
    )
    axs.set_xlabel('predicted')
    axs.set_ylabel('actual')
    axs.title.set_text('Confusion Matrix of the ' + classifier + ' classifier')
    plt.show()
gnb_conf = confusion_matrix(y_test, gnb_predictions, normalize='true')
plot_confusion_matrix(gnb_conf, 'Naive Bayes')
print('accuracy:', accuracy_score(y_test, gnb_predictions))
print('f1 score:', f1_score(y_test, gnb_predictions, average='weighted'))
The accuracy as well as the f1 score are not ideal for the Naive Bayes classifier. In particular, it is not able to capture some of the endmembers well.
As an example, we plot below the means of water and shadow, as used by the Naive Bayes model. On top, we plot all the datapoints correctly classified as water, as well as the datapoints misclassified as shadow even though they were water. We can observe that the spectral signatures of the misclassified datapoints are much closer to the mean values of shadow.
fig, axs = plt.subplots(figsize=(12,8))
plt.plot(X_test[np.where((y_test==1) & (gnb_predictions==7))].T, color='orange', alpha=0.1)
plt.plot(X_test[np.where((y_test==1) & (gnb_predictions==7))][0], color='orange', alpha=0.5,label='Water Missclassified as Shadow')
plt.plot(X_test[np.where((y_test==1) & (gnb_predictions==1))].T, color='green', alpha=0.1)
plt.plot(X_test[np.where((y_test==1) & (gnb_predictions==1))][0], color='green', alpha=0.5, label='Water Classified as Water')
plt.plot(gnb.theta_[0], label='Mean of Water')
plt.plot(gnb.theta_[6], label='Mean of Shadow')
plt.legend()
plt.title('Naive Bayes Means and Missclassified Points')
plt.show()
med_conf = confusion_matrix(y_test, med_predictions, normalize='true')
plot_confusion_matrix(med_conf, 'Minimum Euclidean Distance')
print('accuracy:', accuracy_score(y_test, med_predictions))
print('f1 score:', f1_score(y_test, med_predictions, average='weighted'))
The Minimum Euclidean Distance classifier underperforms: there are high percentages in the off-diagonal part of the confusion matrix.
knn_conf = confusion_matrix(y_test, knnc_predictions, normalize='true')
plot_confusion_matrix(knn_conf, 'KNN')
print('accuracy:', accuracy_score(y_test, knnc_predictions))
print('f1 score:', f1_score(y_test, knnc_predictions, average='weighted'))
The KNN classifier achieves a reasonably good accuracy, with high values on the diagonal. There is a slight drop for the water endmember, which it mostly confuses with shadows and meadows.
bc_conf = confusion_matrix(y_test, bc_predictions, normalize='true')
plot_confusion_matrix(bc_conf, 'Bayesian')
print('accuracy:', accuracy_score(y_test, bc_predictions))
print('f1 score:', f1_score(y_test, bc_predictions, average='weighted'))
The Bayesian classifier also performs well. There are three major flaws:
In order to be able to quickly spot any weaknesses of our classifiers, we can plot a combined confusion matrix. The matrix is row-normalized, so the values can be read as recall.
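As a quick sketch of what this normalization means in scikit-learn's confusion_matrix (the labels below are toy values, purely illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 2]
# normalize='true' divides each row by the support of the actual class,
# so every row sums to one and the diagonal holds the per-class recall
cm = confusion_matrix(y_true, y_pred, normalize='true')
print(cm.sum(axis=1))  # [1. 1. 1.]
```

Class 0 has 3 true samples, 2 of which are predicted correctly, so its row reads [2/3, 1/3, 0] and its diagonal entry is its recall.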
from matplotlib.lines import Line2D
def confusion_to_df(confusion_matrix, classifier):
    df = pd.DataFrame(confusion_matrix)
    df.index.name = 'actual'
    df = df.reset_index()
    df = pd.melt(df, id_vars='actual', var_name='predicted', value_name='class_rate')
    df['classifier'] = classifier
    return df
lmed = confusion_to_df(med_conf,'min_eucl_dist')
lnb = confusion_to_df(gnb_conf,'naive_bayes')
lknn = confusion_to_df(knn_conf, 'knn')
lbc = confusion_to_df(bc_conf, 'bayes_classifier')
lconf = pd.concat([lmed,lnb,lknn,lbc])
fig, axs = plt.subplots(9,9, figsize = (14,14),sharey=True, sharex=True)
palette = [sns.color_palette()[1], sns.color_palette()[0], sns.color_palette()[2], sns.color_palette()[3]]
for i in range(9):
    for j in range(9):
        temp = lconf[(lconf.actual==i) & (lconf.predicted==j)]
        sns.barplot(temp.classifier, temp.class_rate, ax=axs[i][j], palette=palette)
        axs[i][j].set_xlabel('')
        axs[i][j].set_ylabel('')
        if j == 0:
            axs[i][j].set_ylabel(label_descriptions.iloc[i,1])
        if i == 8:
            axs[i][j].set_xlabel(label_descriptions.iloc[j,1])
        axs[i][j].tick_params(labelbottom=False)
plt.suptitle('Confusion Matrices of all Classifiers Combined')
legend_elements = [Line2D([0], [0], color=palette[0], lw=4, label='Min Euclidean Distance')
,Line2D([0], [0], color=palette[1], lw=4, label='Naive Bayes')
,Line2D([0], [0], color=palette[2], lw=4, label='KNN')
,Line2D([0], [0], color=palette[3], lw=4, label='Bayesian Classifier')]
plt.legend(handles=legend_elements,bbox_to_anchor=(4, 5))
plt.show()
Observations:
The Min Euclidean Distance classifier has some trouble identifying trees, which it confuses with tiles. The same issue appears for the rest of the classifiers as well, but not to the same degree. Under Naive Bayes and Min Euclidean Distance, some pixels get misclassified as meadows or, much less often, water; this is also the case for the rest of the classifiers, but not to the same extent. Certain classes are problematic for Min Euclidean Distance and Naive Bayes, whereas the Bayesian Classifier as well as KNN do not seem to have any issues classifying them correctly.
Next we can compare the f1 score, which combines the precision and recall of each classifier. It is calculated as the weighted average of the f1 scores of the individual classes.
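The weighted averaging can be verified by hand on toy labels (the labels below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([1, 1, 1, 1, 2, 2, 3, 3, 3, 3])
y_pred = np.array([1, 1, 1, 2, 2, 2, 3, 3, 1, 3])

# per-class f1 scores (in sorted label order) and class supports
per_class = f1_score(y_true, y_pred, average=None)
supports = np.array([np.sum(y_true == c) for c in [1, 2, 3]])

# the weighted f1 is the support-weighted mean of the per-class scores
manual = np.sum(per_class * supports) / supports.sum()
print(np.isclose(manual, f1_score(y_true, y_pred, average='weighted')))  # True
```

This is why the weighted f1 can look healthy even when a low-support class, such as water here, is handled poorly.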
print('f1 score Naive Bayes:', f1_score(y_test, gnb_predictions, average='weighted'))
print('f1 score Minimum Euclidean Distance:', f1_score(y_test, med_predictions, average='weighted'))
print('f1 score KNN:', f1_score(y_test, knnc_predictions, average='weighted'))
print('f1 score Bayesian Classifier:', f1_score(y_test, bc_predictions, average='weighted'))
We almost have a tie between the Bayesian classifier and the KNN classifier. The next step is to plot something similar to the abundance maps for the 4 classifiers.
def classification_map(class_):
    fig, axs = plt.subplots(1, 5, figsize=(20,12))
    axs = axs.ravel()
    predictions = [
        gnb_predictions
        ,med_predictions
        ,knnc_predictions
        ,bc_predictions]
    titles = ['Naive Bayes', 'Min Euclidean Distance', 'KNN', 'Bayesian Classifier']
    axs[0].imshow(Test_Set==class_, cmap='binary')
    axs[0].grid(False)
    axs[0].title.set_text('Ground Truth')
    for i in range(0, 4):
        map_ = np.zeros(HSI.shape[:2])
        map_[np.where(Test_Set!=0)] = predictions[i]
        axs[i+1].imshow(map_==class_, cmap='binary')
        axs[i+1].grid(False)
        axs[i+1].title.set_text(titles[i])
    plt.show()
classification_map(1)
All classifiers identify water with lots of additional noise. This was also visible in the confusion matrices. The KNN and Bayesian classifiers perform slightly better, though.
classification_map(2)
KNN and the Bayesian Classifier achieve the best performance
classification_map(3)
Here the Bayesian Classifier as well as KNN are able to identify asphalt, though with lots of additional noise.
classification_map(4)
Bricks are robustly identified by all classifiers
classification_map(5)
classification_map(6)
Here the KNN and the Bayesian Classifier achieve the best results
classification_map(7)
classification_map(8)
classification_map(9)
Overall, the Bayesian classifier as well as the KNN classifier managed to identify the endmembers with acceptable performance. Naive Bayes and Minimum Euclidean Distance did not perform equally well.
In both the spectral unmixing part and the classification part, the models were not able to discriminate water well. The success rate of the classifiers was around 60%, and some of the abundance maps unmixed water but with lots of additional noise.
Looking back at the abundance maps, we can see that even though trees are correctly unmixed, there is lots of noise coming from tiles as well. This is common to all the linear models used. Something similar is also observed in the combined confusion matrix, as well as in the classification maps.
The opposite issue, with tiles as the ground truth, is also present. We observed some misclassifications by the Naive Bayes and Minimum Euclidean Distance classifiers, and spectral unmixing was not very robust for this category either. It seems that the Bayesian classifier achieved the best results of all our approaches.
This systematic "failure" raises some concerns regarding our ability, in general, to discriminate between trees and tiles.
This is another issue common to the linear models as well as the classifiers: both approaches struggle to discriminate asphalt, as there is additional noise coming from meadows.
From the opposite direction, with meadows as the ground truth, we observe the same results: there is lots of noise, in both spectral unmixing and classification, introduced by asphalt.
For those three endmembers, both approaches achieved high discrimination power.
Shadows could be unmixed by the fully constrained linear model and were classified properly by most of our classifiers.
From this analysis, when it comes to spectral unmixing, the best model was the Least Squares model subject to $\sum \theta = 1, \ \theta > 0$. Even though it had the largest reconstruction error (which is natural due to its constraints), it was able to unmix the endmembers properly. Regarding classification, there is a tie between KNN and the fully parametric Bayesian classifier. Personally, I would choose the Bayesian classifier, since it requires far fewer resources for training and prediction, always keeping the size of the dataset in mind (usually each class contains enough datapoints for estimating its covariance matrix).